{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sentiment Analysis with Logistic Regression\n", "\n", "This notebook gives a simple example of explaining a linear logistic regression sentiment analysis model using shap. Note that for a linear model, the SHAP value of feature $i$ for the prediction $f(x)$ (assuming feature independence) is just $\\phi_i = \\beta_i \\cdot (x_i - E[x_i])$. Since we are explaining a logistic regression model, the SHAP values are in log-odds units.\n", "\n", "We use the classic IMDB dataset from [this paper](http://www.aclweb.org/anthology/P11-1015). When explaining the model, it is interesting to observe that words absent from the text are sometimes just as important as those that are present." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": "